Fine Tuning
Order-Independence Without Fine Tuning
The development of generative language models that can create long and coherent textual outputs via autoregression has led to a proliferation of uses and a corresponding sweep of analyses as researchers work to determine the limitations of this new paradigm. Unlike humans, these 'Large Language Models' (LLMs) are highly sensitive to small changes in their inputs, leading to unwanted inconsistency in their behavior. One problematic inconsistency when LLMs are used to answer multiple-choice questions or analyze multiple inputs is order dependency: the output of an LLM can (and often does) change significantly when sub-sequences are swapped, despite both orderings being semantically identical. In this paper we present a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences. We show that this method provably eliminates order dependency, and that it can be applied to any transformer-based LLM to enable text generation that is unaffected by re-orderings.
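The abstract does not spell out the mechanism, but one way to make a transformer provably blind to the ordering of a set of sub-sequences is to give every sub-sequence the same positional indices and to mask attention between them, so that no ordering information ever enters the computation. A minimal sketch of that idea in PyTorch (illustrative only, not necessarily the paper's exact construction):

```python
import torch

def set_invariant_inputs(prefix_len, option_lens):
    """Build position ids and an attention mask under which a set of parallel
    sub-sequences ("options") share positions and cannot attend to each other,
    so swapping the options cannot change the model's output.
    Sketch only; not the paper's published code."""
    total = prefix_len + sum(option_lens)

    # Every option restarts at the position right after the shared prefix.
    position_ids = list(range(prefix_len))
    for n in option_lens:
        position_ids += list(range(prefix_len, prefix_len + n))

    # Start from a causal mask, then cut attention between distinct options.
    mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    spans, start = [], prefix_len
    for n in option_lens:
        spans.append((start, start + n))
        start += n
    for a in spans:
        for b in spans:
            if a != b:
                mask[a[0]:a[1], b[0]:b[1]] = False  # option a cannot see option b
    return torch.tensor(position_ids), mask
```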
FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes followed by a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the model output by FedAvg generalizes well to new unseen tasks after a few fine-tuning steps. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear regression setting. We show that the reason behind the generalizability of the FedAvg output is FedAvg's power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via multiple local updates between communication rounds.
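As a concrete picture of the algorithm the abstract describes, here is a minimal FedAvg round in PyTorch (hypothetical client data loaders and model; a sketch, not the paper's experimental code):

```python
import copy
import torch

def fedavg_round(global_model, client_loaders, local_steps=5, lr=0.01):
    """One FedAvg round: each client runs a few local SGD steps starting from
    the global weights, then the server averages the resulting models."""
    client_states = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        batches = iter(loader)  # assumes each loader yields >= local_steps batches
        for _ in range(local_steps):
            x, y = next(batches)
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()
            opt.step()
        client_states.append(model.state_dict())

    # Server update: parameter-wise average of the client models.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```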
Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization
Savage, Thomas, Ma, Stephen, Boukil, Abdessalem, Patel, Vishwesh, Rangan, Ekanath, Rodriguez, Ivan, Chen, Jonathan H
Large Language Model (LLM) fine tuning is underutilized in the field of medicine. Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), but there is little guidance informing users when to use either technique. In this investigation, we compare the performance of SFT and DPO for five common natural language tasks in medicine: Classification with text data, Classification with numeric data, Clinical Reasoning, Summarization, and Clinical Triage. We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance for the more complex tasks of Clinical Reasoning, Summarization and Clinical Triage. Our results establish the role and importance of DPO fine tuning within medicine, and consequently call attention to current software gaps that prevent widespread deployment of this technique.
- North America > United States > California > Santa Clara County > Stanford (0.05)
- Europe > Lithuania > Kaunas County > Kaunas (0.04)
- Asia > India > Gujarat (0.04)
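For context on the comparison above: SFT simply maximizes the likelihood of reference completions, whereas DPO optimizes a preference objective over chosen/rejected response pairs. A minimal sketch of the standard DPO loss (the generic formulation, not the authors' code; inputs are summed per-token log-probabilities for each response):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: push the policy to prefer the chosen response
    over the rejected one, measured relative to a frozen reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```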
UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image
Valevski, Dani, Kalman, Matan, Molad, Eyal, Segalis, Eyal, Matias, Yossi, Leviathan, Yaniv
Text-driven image generation methods have shown impressive results recently, allowing casual users to generate high quality images by providing textual descriptions. However, similar capabilities for editing existing images are still out of reach. Text-driven image editing methods usually need edit masks, struggle with edits that require significant visual changes, and cannot easily keep specific details of the edited portion. In this paper we make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image. We also show that initializing the stochastic sampler with a noised version of the base image before sampling, and interpolating relevant details from the base image after sampling, further increase the quality of the edit operation. Combining these observations, we propose UniTune, a novel image editing method. UniTune takes as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image. UniTune does not require additional inputs, like masks or sketches, and can perform multiple edits on the same image without retraining. We test our method using the Imagen model in a range of different use cases. We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.
- Asia > Middle East > Israel (0.05)
- North America > United States (0.04)
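The two observations in the abstract above, briefly fine-tuning on the single base image and then sampling from a noised copy of that image, can be sketched with a toy DDIM-style loop. The `eps_model(x, t, prompt_emb)` noise-prediction interface and the noise schedule below are assumptions for illustration; UniTune itself builds on Imagen, whose implementation differs:

```python
import torch

def unitune_style_edit(eps_model, base_image, prompt_emb,
                       finetune_steps=100, strength=0.7, lr=1e-5, T=50):
    # (1) Briefly fine-tune the noise-prediction model on the single base image.
    alpha_bar = torch.linspace(0.999, 0.01, T)   # toy noise schedule
    opt = torch.optim.Adam(eps_model.parameters(), lr=lr)
    for _ in range(finetune_steps):
        t = torch.randint(0, T, (1,))
        a = alpha_bar[t]
        eps = torch.randn_like(base_image)
        noisy = a.sqrt() * base_image + (1 - a).sqrt() * eps
        loss = torch.nn.functional.mse_loss(eps_model(noisy, t, prompt_emb), eps)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # (2) Start sampling from a *noised* base image, not pure noise,
    #     so the edit stays faithful to the input.
    t0 = int(strength * (T - 1))
    a0 = alpha_bar[t0]
    x = a0.sqrt() * base_image + (1 - a0).sqrt() * torch.randn_like(base_image)
    with torch.no_grad():                        # deterministic DDIM-style steps
        for t in range(t0, -1, -1):
            a_t = alpha_bar[t]
            a_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
            eps_hat = eps_model(x, torch.tensor([t]), prompt_emb)
            x0_hat = (x - (1 - a_t).sqrt() * eps_hat) / a_t.sqrt()
            x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps_hat
    return x
```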
ExtPerFC: An Efficient 2D and 3D Perception Hardware-Software Framework for Mobile Cobot
Dang, Tuan, Nguyen, Khang, Huber, Manfred
Because the reliability of a robot's perception correlates with the number of sensing modalities integrated to tackle uncertainty, a practical solution is needed to manage these sensors from different computers, operate them simultaneously, and maintain their real-time performance on the existing robotic system with minimal effort. In this work, we present an end-to-end software-hardware framework, namely ExtPerFC, that supports both conventional hardware and software components and integrates machine learning object detectors without requiring an additional dedicated graphics processing unit (GPU). We first design our framework to achieve real-time performance on the existing robotic system, guarantee configuration optimization, and concentrate on code reusability. We then mathematically model and utilize our transfer learning strategies for 2D object detection and fuse them into depth images for 3D depth estimation. Lastly, we systematically test the proposed framework on the Baxter robot with two 7-DOF arms, a four-wheel mobility base, and an Intel RealSense D435i RGB-D camera. The results show that the robot achieves real-time performance while executing other tasks (e.g., map building, localization, navigation, object detection, arm moving, and grasping) simultaneously with available hardware like Intel onboard CPUs/GPUs on distributed computers. Also, to comprehensively control, program, and monitor the robot system, we design and introduce an end-user application. The source code is available at https://github.com/tuantdang/perception_framework.
- North America > United States > Texas (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
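The fusion step the abstract above mentions, projecting 2D detections into depth images for 3D estimation, typically reduces to back-projection through the camera intrinsics. A minimal NumPy sketch of that common pattern (not the ExtPerFC source; see the linked repository for the actual framework):

```python
import numpy as np

def bbox_to_3d(depth_m, bbox, fx, fy, cx, cy):
    # depth_m: HxW depth image in meters (e.g., from a RealSense D435i);
    # bbox: (x0, y0, x1, y1) pixel corners of a 2D detection;
    # fx, fy, cx, cy: pinhole camera intrinsics.
    x0, y0, x1, y1 = bbox
    patch = depth_m[y0:y1, x0:x1]
    z = np.median(patch[patch > 0])          # robust depth inside the box
    u = (x0 + x1) / 2.0                      # box center, pixel coordinates
    v = (y0 + y1) / 2.0
    x = (u - cx) * z / fx                    # back-project through intrinsics
    y = (v - cy) * z / fy
    return np.array([x, y, z])               # 3D point in the camera frame
```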
Taught by the Internet: Exploring Bias in OpenAI's GPT-3
Ayaz, Ali, Nawalgaria, Aditya, Yin, Ruilian
This research delves into the current literature on bias in Natural Language Processing models and the techniques proposed to mitigate the problem of bias, including why it is important to tackle bias in the first place. Additionally, these techniques are further analysed in the light of newly developed models that tower in size over past editions. To achieve those aims, the authors of this paper conducted their research on GPT-3 by OpenAI, the largest NLP model available to consumers today. With 175 billion parameters, in contrast to BERT's 340 million, GPT-3 is the perfect model to test the common pitfalls of NLP models. Tests were conducted through the development of an Applicant Tracking System using GPT-3. For the sake of feasibility and time constraints, the tests primarily focused on gender bias, rather than all or multiple types of bias. Finally, current mitigation techniques are considered and tested to measure their degree of functionality.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- South America > Brazil (0.04)
- Education (1.00)
- Law > Civil Rights & Constitutional Law (0.93)
- Information Technology (0.67)
- Government (0.67)
Fine Tuning with Abnormal Examples
Given the prevalence of crowd-sourced labor in creating Natural Language Processing datasets, these datasets have become increasingly large. For instance, the SQuAD dataset currently sits at over 80,000 records. However, because the English language is rather repetitive in structure, the distribution of word frequencies across the SQuAD dataset's contexts is relatively unchanged. By measuring each sentence's distance from the covariate distribution of word frequencies over all sentences in the dataset, we identify 10,500 examples that create a more uniform distribution for training. Fine-tuning ELECTRA [4] on this subset of examples reaches better performance than a model trained on all 87,000 examples. Herein we introduce a methodology for systematically pruning datasets for fine-tuning that reaches better out-of-sample performance.
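The abstract leaves the distance measure underspecified; as one plausible reading, the sketch below ranks examples by how far their word-frequency profile lies from the dataset-wide mean (cosine distance as a stand-in for the paper's covariate distance) and keeps the most atypical ones:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def select_abnormal(contexts, k=10_500):
    """Keep the k examples whose word-frequency profile is farthest from the
    dataset-wide profile. Cosine distance from the mean bag-of-words vector
    is an assumption standing in for the paper's unspecified measure.
    Assumes non-empty contexts."""
    X = CountVectorizer().fit_transform(contexts).astype(float)
    X = X.multiply(1.0 / X.sum(axis=1))            # row-normalize to frequencies
    mean = np.asarray(X.mean(axis=0)).ravel()      # dataset-wide frequency profile
    Xd = np.asarray(X.todense())
    sim = (Xd @ mean) / (np.linalg.norm(Xd, axis=1) * np.linalg.norm(mean) + 1e-12)
    order = np.argsort(sim)                        # least similar = most abnormal
    return [contexts[i] for i in order[:k]]
```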
How to Use Transfer Learning: Types and Use Cases
If you have spent some time creating machine learning and deep learning models, then you must have heard of Transfer Learning. Well, the name itself tells you everything. Transfer Learning is a technique where we take a pre-trained model and either replace only the end layers of the neural network with our own, or train some of its layers, to get an optimum result with less training time and fewer resources. If you want to see how I used transfer learning, you can check out this notebook (Click Here To Visit Github), where I used the VGG16 pretrained model on the Dogs vs. Cats dataset and created a model using two ways of transfer learning; or just stick with the article, as I am going to explain in much more detail how transfer learning works (there's a quick sketch right below). "Do the Smart Work, Not the Hard Work." For those who don't know what VGG16 is: it is a convolutional neural network, 16 layers deep, trained on the ImageNet dataset.
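Here is a minimal sketch of both ways in PyTorch/torchvision (my notebook may use a different framework; the 2-class head for dogs vs. cats is the assumption here):

```python
import torch
import torch.nn as nn
from torchvision import models

# Way 1: feature extraction - freeze the pretrained VGG16 backbone and
# replace only the end layer of the classifier (here: 2 classes, dog vs cat).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # keep ImageNet features fixed
model.classifier[6] = nn.Linear(4096, 2)         # new, trainable end layer

# Way 2: fine-tuning - additionally unfreeze the last conv block so it can
# adapt to the new data (more training time, often better accuracy).
for p in model.features[24:].parameters():       # last VGG16 conv block
    p.requires_grad = True

# Only the unfrozen parameters go to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```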
Federated Continual Learning through distillation in pervasive computing
Usmanova, Anastasiia, Portet, François, Lalanda, Philippe, Vega, German
Federated Learning has been introduced as a new machine learning paradigm enhancing the use of local devices. At the server level, FL regularly aggregates models learned locally on distributed clients to obtain a more general model. Current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server. Such a setting is not realistic in mobile pervasive computing, where data storage must be kept low and data characteristics can change dramatically. To account for this variability, a solution is to use the data regularly collected by the client to progressively adapt the received model. But such a naive approach exposes clients to the well-known problem of catastrophic forgetting. To address this problem, we have defined a Federated Continual Learning approach which is mainly based on distillation. Our approach allows a better use of resources, eliminating the need to retrain from scratch at the arrival of new data and reducing memory usage by limiting the amount of data to be stored. This proposal has been evaluated in the Human Activity Recognition (HAR) domain and has been shown to effectively reduce the catastrophic forgetting effect.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
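The distillation component the abstract above describes, keeping the updated client model close to a frozen teacher so new data does not erase old knowledge, can be sketched as a standard knowledge-distillation loss (a generic formulation, not necessarily the paper's exact one):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine the usual task loss on new data with a KL term that keeps the
    student close to the frozen teacher's predictions, mitigating catastrophic
    forgetting. T softens the distributions; alpha balances the two terms."""
    task = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * task + (1 - alpha) * soft
```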